§0. Introduction

Prediction theory for discrete-time stationary stochastic processes, a topic in time series, has long been studied and has benefited greatly from the recent work on orthogonal polynomials on the unit circle (OPUC); see e.g. Simon's books [Si1], [Si2], [Si3] for background and references. Here the partial autocorrelation function (PACF) plays an important role, as it provides an unrestricted parametrization of the relevant spectral measure $\mu$; see e.g. the papers by Inoue [In1], [In2], [In3], Inoue and Kasahara [InKa1], [InKa2] and the survey by Bingham [Bi] for background and references. The PACF is essentially the sequence of Verblunsky coefficients, and the unrestricted parametrization is Verblunsky's theorem, in the language of OPUC; see [Si1], Ch. 1, 2.

While this is very satisfactory for univariate time series, one often encounters multivariate time series, particularly in areas such as mathematical finance, where the dimensionality $\ell$ is the number of risky assets held in a portfolio; $\ell$ reflects the need to diversify one's portfolio (in the style of Markowitz), and may be large (see e.g. [BiFrKi] for a case in point). Multivariate time series form an important area (see e.g. the books by Hannan [Ha] and Reinsel [Re]). Likewise, multivariate prediction goes back to the work of Wiener and Masani [WiMa1], [WiMa2] and of Helson and Lowdenslager [HeLo] in 1957-61 (also Masani [Mas1]-[Mas5]). Just as OPUC are relevant in the univariate case, the matrix theory of OPUC (MOPUC for short below) is crucially relevant in the multivariate case. Following the great stimulus to OPUC provided by Simon's books, MOPUC has recently been extensively developed. Our purpose here is to survey these developments with a view to multivariate prediction theory, providing a matrix sequel to [Bi].

§1. The Kolmogorov Isomorphism Theorem

Stone's theorem ([Sto], [RiNa] §137, extended to semigroups in [HiPh] XXII) tells us that a group $U = (U_t)$ of unitary transformations has a spectral representation of the form

$$U_t = \int e^{it\theta}\, E(d\theta),$$

where $E(\cdot)$ is a projection-valued measure. For a stationary stochastic process $X = (X_t)$ (here time $t$ is discrete and $X_t$ is a random complex $\ell$-vector), write $U$ for the shift $t \mapsto t+1$. Since we can write $X_t = U^t X_0$, this gives us a spectral representation for $X = (X_t)$:

$$X_t = U^t X_0 = \int e^{it\theta}\, E(d\theta)\, X_0$$



(see e.g. Rozanov [Ro2], Th. I.4.2). On the left, we have explicit dependence on time $t$: we are in the time domain. On the right, for fixed $t$ we have dependence on $\theta$: we are in the frequency domain. The spectral representation may be summarized as

$$X_t \leftrightarrow e^{it\theta},$$

which expresses the Kolmogorov Isomorphism Theorem (Kolmogorov [Ko] in 1941; cf. [Ro2], 17, 33, Masani [Mas4] §§6, 7). Multiplying these expressions for $X_s$, $X_{s+n}$ and using stationarity, we get on taking expectations the spectral representation of the correlation matrix:

$$\gamma_n = \int e^{-in\theta}\, \mu(d\theta),$$

where $\mu$ is the spectral measure. This is Herglotz's theorem; see e.g. [Ro2], 19-20.
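To make the sign convention in Herglotz's theorem concrete, here is a minimal numerical sketch in Python with NumPy (our choice of tooling, not the paper's). It assumes a hypothetical bivariate VAR(1) example in which $\mu(d\theta) = W(\theta)\,d\theta/2\pi$ with $W(\theta) = (I - \Phi e^{i\theta})^{-1}\Sigma(I - \Phi e^{i\theta})^{-\dagger}$; the matrices `Phi`, `Sigma` and all function names are illustrative only. Quadrature should recover $\gamma_n = \Phi^n\Gamma_0$ for $n \ge 0$, where $\Gamma_0 = \sum_k \Phi^k\Sigma(\Phi^k)^\top$, together with the symmetry $\gamma_{-n} = \gamma_n^\dagger$.

```python
import numpy as np

def var1_density(theta, Phi, Sigma):
    """W(theta) = H Sigma H^dagger with H = (I - Phi e^{i theta})^{-1} (illustrative VAR(1) density)."""
    I = np.eye(Phi.shape[0])
    H = np.linalg.inv(I - Phi * np.exp(1j * theta))
    return H @ Sigma @ H.conj().T

def herglotz_coefficients(density, n_max, n_grid=2048):
    """gamma_n = int e^{-in theta} W(theta) dtheta/2pi for |n| <= n_max (trapezoidal rule)."""
    thetas = 2 * np.pi * np.arange(n_grid) / n_grid
    Ws = np.array([density(t) for t in thetas])          # shape (n_grid, l, l)
    return {n: np.mean(np.exp(-1j * n * thetas)[:, None, None] * Ws, axis=0)
            for n in range(-n_max, n_max + 1)}

def stationary_covariance(Phi, Sigma, terms=500):
    """Gamma_0 = sum_k Phi^k Sigma (Phi^k)^T; converges when the spectral radius of Phi is < 1."""
    G = np.zeros_like(Sigma)
    P = np.eye(Phi.shape[0])
    for _ in range(terms):
        G += P @ Sigma @ P.T
        P = P @ Phi
    return G

# Hypothetical example parameters (not from the text).
Phi = np.array([[0.5, 0.2], [-0.1, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

gamma = herglotz_coefficients(lambda t: var1_density(t, Phi, Sigma), n_max=3)
Gamma0 = stationary_covariance(Phi, Sigma)
for n in range(4):
    # Herglotz coefficients against the closed form Phi^n Gamma_0 (n >= 0)
    print(n, np.allclose(gamma[n], np.linalg.matrix_power(Phi, n) @ Gamma0))
```

Any other Hermitian, integrable density can be passed as `density`; the trapezoidal rule is used because it is spectrally accurate for smooth periodic integrands.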

§2. Verblunsky's theorem 

In the univariate case, the Verblunsky coefficients $\alpha = (\alpha_n)_{n=0}^{\infty}$ satisfy $|\alpha_n| < 1$, and the bijection

$$\alpha \leftrightarrow \mu$$

of Verblunsky's theorem provides the unrestricted parametrization so useful in statistics and prediction theory; see e.g. Pourahmadi [Pou1], [Pou2], [Bi] §2 for background here.
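What 'unrestricted' buys in practice can be seen in a small sketch, assuming real scalar data and the usual time-series convention in which the PACF values play the role of the Verblunsky coefficients (up to sign and conjugation conventions, which we do not track here). The function name and the sample PACF values are hypothetical. Any choice of values in $(-1, 1)$, run through the inverse Levinson-Durbin recursion, should yield a valid (positive-definite) autocovariance sequence, with final one-step prediction-error variance $\gamma_0\prod_n(1 - \phi_{nn}^2)$.

```python
import numpy as np

def pacf_to_autocovariance(pacf, gamma0=1.0):
    """Inverse Levinson-Durbin recursion: PACF values in (-1, 1) -> autocovariances gamma_0..gamma_p."""
    gamma = [gamma0]
    phi = np.zeros(0)       # predictor coefficients phi_{n-1,1}, ..., phi_{n-1,n-1}
    v = gamma0              # one-step prediction-error variance v_{n-1}
    for k in pacf:
        n = len(gamma)      # the lag being constructed
        # gamma_n = phi_{nn} v_{n-1} + sum_{j=1}^{n-1} phi_{n-1,j} gamma_{n-j}
        gamma.append(k * v + phi @ np.array(gamma[n - 1:0:-1]))
        phi = np.concatenate([phi - k * phi[::-1], [k]])
        v *= 1 - k ** 2
    return np.array(gamma), v

# Hypothetical, otherwise unconstrained PACF values in (-1, 1).
pacf = np.array([0.9, -0.8, 0.7, 0.95, -0.99])
gamma, v = pacf_to_autocovariance(pacf)
T = np.array([[gamma[abs(i - j)] for j in range(len(gamma))] for i in range(len(gamma))])
print(np.linalg.eigvalsh(T).min() > 0)    # the Toeplitz matrix is positive-definite
```

The design point is that no constraint beyond $|\phi_{nn}| < 1$ is needed, which is why PACF coordinates are convenient for estimation and simulation.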

In the $\ell$-dimensional case, where one uses MOPUC rather than OPUC, the spectral measure $\mu$ is now $\ell \times \ell$ matrix-valued, and corresponds to the $\ell \times \ell$ covariance matrix by Herglotz's theorem. The Verblunsky coefficients have been studied by Damanik, Pushnitski and Simon [DaPuSi], §3. They show in (3.10) that the Verblunsky coefficients are now $\ell \times \ell$ matrices satisfying

$$\|\alpha_n\| < 1,$$

that any sequence $\alpha$ of such matrices can arise in this way, and that the map $\alpha \mapsto \mu$ is again a bijection (Verblunsky's theorem for MOPUC; [DaPuSi], Th. 3.12). Their method is Bernstein-Szego approximation ([DaPuSi] §3.6; cf. [Si1] Th. 1.7.8 in the scalar case).

Results of this type go back in the statistical literature to Morf, Vieira and Kailath [MoViKa] in 1978. They showed that the $\alpha_n$ have singular values $\sigma_n$ with $|\sigma_n| < 1$.

The Szego recursion that leads to OPUC is known in the time-series literature as the Levinson-Durbin algorithm. The Levinson-Durbin algorithm was extended to the multivariate case by Whittle [Wh] in 1963, and by Wiggins and Robinson [WiRo] in 1965. In [MoViKa], the authors use the results of [Wh] and [WiRo], which they call the LWR algorithm, and give a normalized version of it. They show that unless the Levinson-Durbin algorithm stops (corresponding to some correlation matrix failing to be positive-definite), $\|\alpha_n\| < 1$ for all $n$.
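A minimal sketch of the Whittle (LWR) recursion, written by us in Python with NumPy for real-valued $\ell$-variate data; it does not follow the notation of [Wh], [WiRo] or [MoViKa]. Here `R[n]` $= E[X_{t+n}X_t^\top]$, `A` holds the forward predictor coefficients, `V` and `U` are the forward and backward prediction-error covariances, and `rho` is the normalized partial correlation $V^{-1/2}\Delta U^{-1/2}$, which we take as one version of the matrix PACF. Its identification with the MOPUC $\alpha_n$ of [DaPuSi] is assumed to hold only up to convention-dependent unitary factors, which we do not verify. In exact arithmetic the recursion breaks down only when some $V$ or $U$ becomes singular, matching the remark above. The test case is a hypothetical VAR(1) process $X_t = \Phi X_{t-1} + \varepsilon_t$, for which the order-one predictor is already exact.

```python
import numpy as np

def inv_sqrt(M):
    """Inverse of the symmetric square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def whittle_lwr(R, p):
    """Multivariate Levinson-Durbin (Whittle / LWR) recursion, real-valued case.

    R[n] = E[X_{t+n} X_t^T] for n = 0, ..., p.
    Returns the order-p forward coefficients A_1..A_p (X_t ~ sum_k A_k X_{t-k}),
    the forward error covariances V_0..V_p, and normalized partial correlations rho_1..rho_p.
    """
    A, B = [], []                      # forward / backward coefficients of the current order
    V, U = R[0].copy(), R[0].copy()    # forward / backward error covariances
    Vs, rhos = [V], []
    for m in range(p):
        # Delta = E[e_t b_{t-1}^T] = R(m+1) - sum_{k=1}^{m} A_k R(m+1-k)
        Delta = R[m + 1].copy()
        for k in range(m):
            Delta -= A[k] @ R[m - k]
        rhos.append(inv_sqrt(V) @ Delta @ inv_sqrt(U))
        a_new = Delta @ np.linalg.inv(U)        # new forward reflection coefficient
        b_new = Delta.T @ np.linalg.inv(V)      # new backward reflection coefficient
        # Both updates use the old A and B (the right-hand side is evaluated first).
        A, B = ([A[k] - a_new @ B[m - 1 - k] for k in range(m)] + [a_new],
                [B[k] - b_new @ A[m - 1 - k] for k in range(m)] + [b_new])
        V = V - a_new @ Delta.T
        U = U - b_new @ Delta
        Vs.append(V)
    return A, Vs, rhos

# Hypothetical VAR(1) test case: X_t = Phi X_{t-1} + eps_t, Cov(eps_t) = Sigma.
Phi = np.array([[0.5, 0.2], [-0.1, 0.3]])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
Gamma0 = sum(np.linalg.matrix_power(Phi, k) @ Sigma @ np.linalg.matrix_power(Phi, k).T
             for k in range(200))
R = [np.linalg.matrix_power(Phi, n) @ Gamma0 for n in range(4)]   # R[n] = Phi^n Gamma_0
A, Vs, rhos = whittle_lwr(R, 3)
print(np.round(A[0], 6))                                      # recovers Phi
print([round(float(np.linalg.norm(r, 2)), 6) for r in rhos])  # first < 1, the rest ~ 0
```

For this example one should see $A_1 = \Phi$ with higher coefficients zero, $V_p = \Sigma$ for $p \ge 1$, and $\|\rho_1\| < 1$ with $\rho_p = 0$ for $p \ge 2$.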

§3. Szego's theorem 

Derevyagin, Holtz, Khrushchev and Tyaglov [DeHoKhTy] continue this study, again using Bernstein-Szego approximation (§§5, 6). They show (§7) that (using $\dagger$ for the adjoint (conjugate transpose), and $\mu'$ for the density of (the absolutely continuous component of) $\mu$, denoted $w$ below)

$$\log \prod_{n=0}^{\infty} \det(I - \alpha_n^{\dagger}\alpha_n) = \int \operatorname{tr} \log \mu'\, d\theta/2\pi = \int \operatorname{tr} \log w\, d\theta/2\pi$$

for any non-trivial (i.e. of infinite support) matrix-valued probability measure on the unit circle. Call those for which the integral on the right is $> -\infty$ Szego measures, and the finiteness of the integral Szego's condition, (Sz). They deduce that $\mu$ is a Szego measure iff

$$\sum_{n} \|\alpha_n\|^2 < \infty$$

("$\alpha \in \ell_2$"). This is Szego's theorem for MOPUC. They continue (§8) with the matrix version of the Helson-Lowdenslager theorem:

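$$\exp \int \tfrac{1}{\ell} \operatorname{tr} \log \mu'\, d\theta/2\pi = \inf_{A, P} \int \tfrac{1}{\ell} \operatorname{tr}\left[(A + P)^{\dagger}\, d\mu\, (A + P)\right],$$

where the infimum is over all matrices $A$ of determinant 1 and all trigonometric polynomials $P(e^{i\theta}) = \sum_{k > 0} c_k e^{ik\theta}$.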
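A numerical sanity check of the limit behind the first identity of this section, in Python with NumPy under stated assumptions: the $2 \times 2$ density `W` below is a hypothetical trigonometric polynomial, Hermitian and positive-definite on the circle, so $\mu$ is a Szego measure. By a Schur-complement identity, $\log\det T_{N+1} - \log\det T_N$, for the block Toeplitz matrices $T_N = (\gamma_{j-k})_{j,k=1}^{N}$, is the log-determinant of the one-step prediction-error covariance from a past of length $N$, and it should converge to $\int \operatorname{tr}\log w\, d\theta/2\pi = \int \log\det w\, d\theta/2\pi$ (the matrix Kolmogorov formula). For a probability measure ($\gamma_0 = I$) we expect these differences to be the partial sums of $\sum_n \log\det(I - \alpha_n^\dagger\alpha_n)$, but that identification is not tested here.

```python
import numpy as np

def W(theta):
    """Hypothetical 2x2 spectral density; Hermitian and positive-definite for every theta."""
    z = np.exp(1j * theta)
    return np.array([[2 + np.cos(theta), 0.5 * z],
                     [0.5 * np.conj(z), 1.5]])

def fourier_coefficient(n, n_grid=256):
    """gamma_n = int e^{-in theta} W(theta) dtheta/2pi (trapezoidal rule, exact up to
    rounding for the |n| <= N_MAX used below, since W is a trigonometric polynomial)."""
    thetas = 2 * np.pi * np.arange(n_grid) / n_grid
    return np.mean([np.exp(-1j * n * t) * W(t) for t in thetas], axis=0)

N_MAX = 40
g = {n: fourier_coefficient(n) for n in range(-N_MAX, N_MAX + 1)}

def block_toeplitz(N):
    """N x N block Toeplitz matrix T_N with (j, k) block gamma_{j-k}."""
    return np.block([[g[j - k] for k in range(N)] for j in range(N)])

# log det T_{N+1} - log det T_N: log-determinant of the one-step prediction-error
# covariance given a past of length N (Schur complement).
logdets = np.array([np.linalg.slogdet(block_toeplitz(N))[1] for N in range(1, N_MAX + 2)])
pred_err = np.diff(logdets)          # pred_err[i] corresponds to a past of length i + 1

# Right-hand side: int tr log W dtheta/2pi = int log det W dtheta/2pi.
thetas = 2 * np.pi * np.arange(4096) / 4096
szego_integral = np.mean([np.log(np.linalg.det(W(t)).real) for t in thetas])
print(pred_err[0] - szego_integral, pred_err[-1] - szego_integral)
```

The differences decrease towards the Szego integral; for a density violating (Sz), the right-hand side would be $-\infty$ and the prediction-error determinants would tend to 0.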

The Wold decomposition extends from the scalar to the vector case. See e.g. Masani [Mas1], [Mas4], Hannan [Ha], and, in the context of operator theory, Sz.-Nagy and Foias [SzNF], Ch. 1, Nikolskii [Nik] Ch. 1. As in the scalar case, (Sz) is the condition that the process should not be purely deterministic, i.e. that the moving-average component in the Wold decomposition should be non-trivial. This is the condition of non-determinism, (ND) ("(Sz) = (ND)").

As in the scalar case, it is often useful to strengthen the Szego condition by requiring explicitly that the singular component $\mu_s$ of the spectral measure $\mu$ should vanish. This is called the condition of pure non-determinism, (PND):

$$(PND) = (ND) + \{\mu_s = 0\} = (Sz) + \{\mu_s = 0\}.$$

§4. Matrix spectral factorizations and matrix Szego functions 

Factorizations are already present in the scalar case. For an analytic function in the Hardy space on the disc, identify the boundary values of the function on the unit circle with the function itself, as usual; then the spectral density $w$ and the Szego function $h$ are related by

$$w = h\bar{h} = |h|^2.$$

Here $h$ is in the Hardy space $H_2$, and is an outer function (see e.g. [Bi] for references and details); one can think of $h$ as the 'analytic square root' of $w$.
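For a smooth, strictly positive scalar density the 'analytic square root' can be computed from the Fourier coefficients $\hat c_k = \int e^{-ik\theta}\log w\, d\theta/2\pi$ via $\log h = \tfrac12\hat c_0 + \sum_{k \ge 1}\hat c_k e^{ik\theta}$, which normalizes $h(0) > 0$. The sketch below (Python/NumPy; the function name is hypothetical) does this on a uniform grid with the FFT. It is only accurate when $\log w$ is smooth enough for aliasing to be negligible. For $w = |1 + b e^{i\theta}|^2$ with $|b| < 1$ it should return $h = 1 + b e^{i\theta}$.

```python
import numpy as np

def scalar_szego_function(w_samples):
    """Boundary values of the outer function h with |h|^2 = w and h(0) > 0.

    Uses log h = c_0/2 + sum_{k>=1} c_k e^{ik theta}, where c_k are the Fourier
    coefficients of log w, computed with the FFT on a uniform grid.
    """
    n = len(w_samples)
    mask = np.zeros(n)
    mask[0] = 0.5            # half of the mean of log w
    mask[1:n // 2] = 1.0     # keep positive frequencies; drop negative ones and the Nyquist term
    log_h = np.fft.ifft(np.fft.fft(np.log(w_samples)) * mask)
    return np.exp(log_h)

# Hypothetical example: w = |1 + b e^{i theta}|^2, whose Szego function is 1 + b e^{i theta}.
b, n = 0.6, 256
theta = 2 * np.pi * np.arange(n) / n
w = np.abs(1 + b * np.exp(1j * theta)) ** 2
h = scalar_szego_function(w)
print(np.max(np.abs(h - (1 + b * np.exp(1j * theta)))))   # tiny: the outer factor is recovered
```

This cepstral construction relies on commutativity ($\log|h|^2 = 2\,\mathrm{Re}\log h$), so it has no direct analogue for matrix densities, which is one reason the matrix factorization problem is harder.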
In the matrix case, one speaks of the spectral factorization problem. The two main traditional approaches to multivariate prediction theory are those of Helson and Lowdenslager [HeLo], based on approximation by polynomials, and of Wiener and Masani [WiMas1]-[WiMas3], based on matrix factorization:

$$W = GG^{\dagger}, \qquad (WM)$$

where again $G$ is an outer function in the sense of matrix-valued Hardy spaces (see e.g. [Pel3], §13.3 and Appendix 2.3, or [RoRo], Ch. 4, 5, for definitions and details; cf. Stein [Ste], Ch. III).¹ We refer to factorizations of the form (WM) as Wiener-Masani factorizations, and to $G$, the matrix analogue of the Szego function $h$ in the scalar case ($D$ in the notation of [Si1]), as a (matrix) Szego function. Wiener-Masani factorizations are unique up to unitary equivalence ([WiMas2], Th. 8.12).

This work is well summarized in Masani [Mas3]. Here one finds: 



¹ Wiener and Masani [WiMas3], Def. 3.5, use the term optimal in place of outer. Rozanov [Ro2], Th. II.4.2, uses the term maximal.



(i) the Wold decomposition (§4; [WiMas1], Th. 7.11). In the scalar case, the deterministic (time-independent) component of $X_t$ corresponds to the singular component $\mu_s$ of the spectral measure, and the moving-average component to the absolutely continuous component $\mu_{ac}$, in the Lebesgue decomposition. In the matrix case, the terms Wold-Zasuhin decomposition and Lebesgue-Cramer decomposition are often used; the correspondence is exact in the full-rank case (below), but not in general [Mas1];

(ii) the Kolmogorov Isomorphism Theorem, between the time domain and 
the spectral domain (§§6, 7); 

(iii) Wiener-Masani factorizations (WM) (Th. 9.7; see also Rozanov [Ro1], [Ro2]);

(iv) the matrix extension of Kolmogorov's formula for the one-step prediction error (eq. (10.1), the main result of [WiMas1] (Th. 7.10); see also Whittle [Wh]);

(v) convergence of the finite-past predictor to the infinite-past predictor (§13; cf. Baxter's inequality, §6 below).
To these, we also add 

(vi) Whittle's multivariate extension of the Levinson-Durbin algorithm, men- 
tioned in §2. 
See also [Mas4] for extensive commentary on Wiener's work in this area. 
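Wiener-Masani factorizations (iii) can be approximated numerically in several ways. The sketch below swaps in Bauer's Cholesky method, a standard numerical technique that is not the construction of [WiMas1]-[WiMas3]. For a hypothetical $2 \times 2$ density that is a trigonometric polynomial of degree one, the last block row of the Cholesky factor of the block Toeplitz matrix $(\gamma_{j-k})$ should converge to the coefficients of an outer $G(z) = G_0 + G_1 z$ with $W = GG^\dagger$ on the circle. Outerness is checked through the determinant: $\det G(z) = \det G_0 \det(I + G_0^{-1}G_1 z)$ has no zeros in $|z| \le 1$ iff the spectral radius of $G_0^{-1}G_1$ is below 1. All names are illustrative.

```python
import numpy as np

# Hypothetical density W(theta) = gamma_1^dagger e^{-i theta} + gamma_0 + gamma_1 e^{i theta},
# Hermitian and positive-definite on the circle (the shape of a matrix MA(1) spectrum).
gamma0 = np.array([[2.0, 0.0], [0.0, 1.5]], dtype=complex)
gamma1 = np.array([[0.5, 0.5], [0.0, 0.0]], dtype=complex)   # coefficient of e^{i theta}

def block(d, l=2):
    """gamma_d for the example above (zero for |d| > 1)."""
    if d == 0:
        return gamma0
    if d == 1:
        return gamma1
    if d == -1:
        return gamma1.conj().T
    return np.zeros((l, l), dtype=complex)

def bauer_factor(N):
    """Bauer's method: Cholesky-factor T_N = (gamma_{j-k}); the last block row of the
    lower factor approximates the coefficients G_0, G_1 of an outer factor."""
    T = np.block([[block(j - k) for k in range(N)] for j in range(N)])
    L = np.linalg.cholesky(T)                        # T = L L^dagger, L lower triangular
    l = gamma0.shape[0]
    G0 = L[(N - 1) * l:, (N - 1) * l:]               # diagonal block L_{N,N}
    G1 = L[(N - 1) * l:, (N - 2) * l:(N - 1) * l]    # sub-diagonal block L_{N,N-1}
    return G0, G1

G0, G1 = bauer_factor(40)
# Coefficient identities of W = G G^dagger: gamma_0 = G0 G0^+ + G1 G1^+, gamma_1 = G1 G0^+.
print(np.allclose(G0 @ G0.conj().T + G1 @ G1.conj().T, gamma0),
      np.allclose(G1 @ G0.conj().T, gamma1))
spec_rad = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(G0, G1))))
print(spec_rad)   # below 1, so det G(z) has no zeros in |z| <= 1
```

Cholesky picks the representative with $G_0$ lower triangular with positive diagonal; any $GU$ with $U$ a constant unitary matrix is another factor, in line with uniqueness up to unitary equivalence.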

In the matrix case, one needs to distinguish between the full-rank and degenerate-rank cases, where the rank of the spectral density matrix $W$ is full ($\ell$) or degenerate ($m < \ell$). The degenerate-rank case is considered in [Ro2], [Mas3], §§11, 12, [WiMas3] for $\ell = 2$, and Matveev [Mat] in the general case; we refer there for details. The generic, and easier, case is the full-rank case, where $W$ is positive-definite (the contrast between the full- and degenerate-rank cases is similar to that arising in, e.g., regression, where one encounters multicollinearity; see e.g. [BiFr], §7.4).

This interesting work dates from the 1960s, before the work of Fefferman and Stein on BMO and of Sarason on VMO. Armed with these, Peller [Pel1] in 1990 considered matrix spectral factorizations

$$W = h^{*}h = h_{*}h_{*}^{*}$$

(recall $h$ is determined to within unitary equivalence). He introduced the phase function

$$u = h_{*}^{-1}h^{*},$$

the analogue of the phase function $\bar{h}/h$ of §5 in the scalar case. He showed, among other results, that the process is completely regular (§10 below) iff

$$u \in VMO.$$

Arov and Dym ([ArDy3], §3.16) give matrix factorizations of positive-definite functions into factors from the Nevanlinna class.

The simplest, and principal, case is that of a purely non-deterministic process of full rank. See e.g. several papers by Ephremidze, Janashia and Lagvilava [EpJaLa], [EpLa], [JaLaEp]. In particular, a (matrix) function in a Hardy space is outer iff its (scalar) determinant is outer.

More general than the matrix-valued case is the operator-valued case. This is important in operator theory, in non-commutative probability theory, non-commutative Hardy-space theory, non-commutative martingale inequalities etc. See e.g. [RoRo], esp. Ch. 6, Curtain and Zwart [CuZw], Barclay [Bar1], [Bar2], Mei [Mei], Peller [Pel1]-[Pel3].

§5. The strong Szego theorem 

The strong Szego theorem, as presented in e.g. [Si1] Ch. 6, [Bi] §5, extends in full to the matrix case. For a short proof, see Bottcher [Bo1]; cf. [Bo2], [Bo3], Basor and Widom [BasWi]. A different approach has been given more recently by Chanzy [Cha1], [Cha2].

§6. Baxter's inequality and Baxter's theorem 

Baxter used OPUC in a series of probabilistic papers of 1961-63 ([Bax1]-[Bax3]). His results concern, among other things, the weak and strong forms of Szego's limit theorem (for Toeplitz determinants), finite and infinite Wiener-Hopf equations (in discrete time $n = 0, 1, 2, \ldots$: finite with $\sum_{k=0}^{n}$, infinite with $\sum_{k=0}^{\infty}$), and the convergence of finite-predictor coefficients (in which one is given a finite section of the past of length $n$) to the corresponding infinite-predictor coefficients. This last depends on Baxter's inequality [Bax3]. Baxter's inequality was used by Simon [Si1], Ch. 5, in his proof that the Verblunsky coefficients $\alpha \in \ell_1$ iff the correlation function $\gamma \in \ell_1$, the spectral measure $\mu$ is absolutely continuous, and its density $\mu' = w$ is continuous and positive. Simon calls this result Baxter's theorem (though Baxter did not formulate the result in this form, and 'the Baxter-Simon theorem' might be better here, but we will follow [Si1]). Perhaps because of this rather involved history, and the fact that Simon's book [Si1] is still comparatively recent, there is as yet no matrix extension of Baxter's theorem. We raise here the question of obtaining one, and turn now to what is known.
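Baxter's convergence result can be watched in a simple scalar case. The sketch below (Python/NumPy; function and variable names are hypothetical) runs the Levinson-Durbin recursion for an MA(1) process $x_t = e_t + b e_{t-1}$, whose autocovariances vanish beyond lag 1 (so $\gamma \in \ell_1$) and whose density $|1 + be^{i\theta}|^2$ is continuous and positive. It compares the finite-past predictor coefficients $\phi_{n,k}$ with the infinite-past ones $a_k = -(-b)^k$ in the $\ell_1$ form of Baxter's inequality, $\sum_{k \le n}|\phi_{n,k} - a_k| \le C\sum_{k > n}|a_k|$. The PACF values $\phi_{n,n}$ agree with the Verblunsky coefficients up to a sign/conjugation convention; here they decay geometrically, consistent with $\alpha \in \ell_1$ in Baxter's theorem.

```python
import numpy as np

def levinson_durbin(gamma, n_max):
    """Scalar Levinson-Durbin recursion.

    gamma[k] is the autocovariance at lag k (k = 0..n_max).  Returns the finite-past
    predictor coefficients phi_n = (phi_{n,1}, ..., phi_{n,n}) for n = 1..n_max
    and the PACF values phi_{n,n}.
    """
    phi = np.zeros(0)
    v = gamma[0]
    coeffs, pacf = [], []
    for n in range(1, n_max + 1):
        k = (gamma[n] - phi @ gamma[n - 1:0:-1]) / v
        phi = np.concatenate([phi - k * phi[::-1], [k]])
        v *= 1 - k ** 2
        coeffs.append(phi)
        pacf.append(k)
    return coeffs, np.array(pacf)

# Hypothetical MA(1) example: x_t = e_t + b e_{t-1}, Var(e_t) = 1, |b| < 1.
b, n_max = 0.5, 30
gamma = np.zeros(n_max + 1)
gamma[0], gamma[1] = 1 + b ** 2, b
coeffs, pacf = levinson_durbin(gamma, n_max)

# Infinite-past predictor x_t = sum_{k>=1} a_k x_{t-k} + e_t has a_k = -(-b)^k.
ks = np.arange(1, n_max + 1)
a_inf = -((-b) ** ks)
errors = np.array([np.abs(c - a_inf[:n]).sum() for n, c in enumerate(coeffs, start=1)])
tails = np.abs(b) ** (ks + 1) / (1 - np.abs(b))   # sum_{k>n} |a_k|, in closed form
print(np.max(errors / tails))    # stays bounded, as Baxter's inequality predicts
```

A matrix version of this experiment would replace the scalar recursion by Whittle's recursion; whether the ratio stays bounded under weaker conditions is exactly the kind of question raised here.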

Baxter's inequality and convergence of finite predictors in the matrix case were considered by Masani in 1966 ([Mas3] §13) and by Cheng and Pourahmadi [ChPo] in 1993. Theoretical progress in the area since then has been extensive, and the question arises of weakening the conditions that they impose. For more recent developments in the scalar case, see [InKa2].

Results on approximation by such finite-section operators have been given 
in great generality by Seidel and Silbermann [SeSi] (see §2.5.4), using Banach- 
algebra techniques (as did Baxter and Simon). 

§7. Nehari sequences and the Levinson-McKean condition

Nehari's theorem of 1957 states that a Hankel operator is a bounded map from $\ell_2$ on the natural numbers to itself iff the sequence generating it is the sequence of negative Fourier coefficients of a bounded function. See e.g. [Si1] Th. 6.2.17 ("The modern theory of Hankel operators started with the following result of Nehari"), Peller [Pel3]. Finding such a bounded function for a given generating sequence is thus a type of moment problem, and as with other moment problems there may be no solution, a unique solution or more than one solution; the moment problem is then called insoluble, determinate or indeterminate. The indeterminate case is particularly important; the generating sequence is then called a Nehari sequence. This Nehari moment (or interpolation) problem was considered by Adamjan, Arov and Krein [AdArKr] in 1968; they described the solution set in terms of Sarason's concept of rigidity [Sa1]. Rigidity has previously been studied in this area in connection with the concept of complete non-determinism (CND); see Bloomfield, Jewell and Hayashi [BlJeHa]. It turns out that (CND) is equivalent to the intersection of past and future property (IPF) [IK2]. It has very recently been shown that both are equivalent to the Levinson-McKean property (the name comes from work of Levinson and McKean, [LeMcK] p. 105, in continuous time, and is given by analogy in the discrete-time case). Phase functions (§4) play a crucial role here; see Kasahara and Bingham [KaBi] for details.

The question arises of matrix extensions of these results (work in progress). The matrix Nehari problem has been considered in detail by Arov and Dym [ArDy1], [ArDy2], [ArDy3] Ch. 4, 7, 10 ('strong regularity'). See also Dym's review of Peller's book on Hankel operators [Pel3] (MR1949210 (2004e:47040)). For matrix phase functions, see §4.

Related to this Nehari problem is the Schur (interpolation) problem, the 
matrix case of which is considered at book length in Dubovoj, Fritzsche and 
Kirstein [DuFrKi]; cf. [ArDy], §7.6. 

§8. Pure minimality 

In the scalar case, pure minimality is characterized by ($\mu_s = 0$ and) Kolmogorov's condition $1/w \in L_1$, using $w$ for the density of the spectral measure $\mu$ (now absolutely continuous). This result extends to the multivariate case; see Makagon and Weron [MaWe], [Pou3, Th. 8.10]. The spectral density is now a matrix $W$, and its inverse $W^{-1}$ is integrable. It is thus natural that the condition

$$W^{-1} \in L_1$$

should be imposed in studying processes subject to stronger regularity conditions than pure minimality. We turn below to two such conditions: positive angle (§9) and complete regularity (§10). We regard the four conditions in §§7-10, which are in increasing order of strength, as intermediate conditions, being intermediate between the weak conditions (ND), (PND) (Szego condition + $\mu_s = 0$) and the strong conditions (B), (sSz) (Baxter's condition and the strong Szego condition). See e.g. [Bi] for the scalar case.

§9. Positive angle and the matrix Muckenhoupt condition

The Muckenhoupt condition $(A_2)$ of analysis is important in many areas; see e.g. [Bi] §6.2 for background and references. It occurs in connection with the positive angle condition, (PA), and the conditions of Helson and Szego [HeSz] and of Helson and Sarason [HeSa]. Matrix versions of the Muckenhoupt condition are considered at length by Arov and Dym [ArDy1], [ArDy3]. For matrix versions of the Helson-Szego condition, see Pourahmadi [Pou1].

Treil and Volberg [TrVo1] show that the following matrix Muckenhoupt condition is necessary and sufficient for the positive-angle condition (PA) in the multivariate case:

$$\sup_{I} \left\| \left(\frac{1}{|I|}\int_I W\right)^{1/2} \left(\frac{1}{|I|}\int_I W^{-1}\right)^{1/2} \right\| < \infty, \qquad (A_2)$$

where the supremum is taken over all intervals $I$ of the unit circle. Here the condition that $W$ be invertible a.e. (which corresponds to the 'pure' in pure minimality; see §8 above) has to be imposed explicitly, as noted by Peller in his review of [TrVo2] (MR1428818 (99k:42073)).
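To make the quantity in $(A_2)$ concrete, here is a rough numerical proxy in Python with NumPy. The function names and the sample density are hypothetical, and a finite grid can only approximate the supremum over all arcs. It evaluates $\|\langle W\rangle_I^{1/2}\langle W^{-1}\rangle_I^{1/2}\|$, with $\langle\cdot\rangle_I$ the average over $I$, for every arc made of consecutive grid points, and returns the largest value. The value is always at least 1, since $\langle W^{-1}\rangle_I \ge \langle W\rangle_I^{-1}$ by operator convexity of the inverse; a density that is bounded with bounded inverse, like the sample below, satisfies $(A_2)$ trivially.

```python
import numpy as np

def herm_sqrt(M):
    """Square root of a Hermitian positive-definite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def a2_proxy(W_samples):
    """Grid approximation of sup_I || <W>_I^{1/2} <W^{-1}>_I^{1/2} || over arcs I.

    W_samples has shape (n, l, l): W at n equally spaced points of the circle.
    Arcs are runs of consecutive grid points (wrapping around), and averages are
    grid means, so this only approximates the supremum in (A_2).
    """
    n = W_samples.shape[0]
    W_inv = np.linalg.inv(W_samples)
    zero = np.zeros((1,) + W_samples.shape[1:], dtype=complex)
    # Prefix sums over the doubled sequence give each arc average in O(1).
    cW = np.concatenate([zero, np.cumsum(np.concatenate([W_samples, W_samples]), axis=0)])
    cV = np.concatenate([zero, np.cumsum(np.concatenate([W_inv, W_inv]), axis=0)])
    best = 0.0
    for start in range(n):
        for length in range(1, n + 1):
            avg_W = (cW[start + length] - cW[start]) / length
            avg_V = (cV[start + length] - cV[start]) / length
            best = max(best, np.linalg.norm(herm_sqrt(avg_W) @ herm_sqrt(avg_V), 2))
    return best

def W(theta):
    """Hypothetical smooth, positive-definite 2x2 density (bounded, with bounded inverse)."""
    z = np.exp(1j * theta)
    return np.array([[2 + np.cos(theta), 0.5 * z], [0.5 * np.conj(z), 1.5]])

thetas = 2 * np.pi * np.arange(64) / 64
smooth = a2_proxy(np.array([W(t) for t in thetas]))
constant = a2_proxy(np.array([[[2.0, 0.3], [0.3, 1.0]]] * 64))
print(constant, smooth)   # a constant W gives 1; every W gives a value >= 1
```

For densities whose determinant has zeros or singularities, refining the grid indicates whether the proxy stays bounded, though no finite computation proves $(A_2)$.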

As shown in [HeSz], [HeSa], the condition (PA) (and so $(A_2)$, by the above) is equivalent to a condition on the sequence $\rho(n)$ of regularity coefficients of the form

$$\rho(\cdot) < 1.$$

We note that in his review of earlier work on this problem by Makagon, Miamee and Schroder [MaMiSc], Pourahmadi says "Attempts to obtain a similar result for q-variate stationary sequences have been unyielding" (MR1443841 (98g:60074)).

§10. Complete regularity

We turn now to a strengthening of the conditions of §9 above. The process is said to be completely regular if $\rho(n) \to 0$ as $n \to \infty$; see [IbRo], Ch. 4, 5. It was shown by Treil and Volberg [TV2] that complete regularity is equivalent to the following strengthening of the Muckenhoupt condition $(A_2)$:

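$$\limsup_{|I| \to 0} \left\| \left(\frac{1}{|I|}\int_I W\right)^{1/2} \left(\frac{1}{|I|}\int_I W^{-1}\right)^{1/2} \right\| = 1$$

(cf. Peller [Pel1], [Pel3]); here as before $W^{-1} \in L_1$ needs to be assumed explicitly. Note the form ("$\rho(\cdot) \to 0$, $\limsup \ldots = 1$") of the strengthenings here of the conditions ("$\rho(\cdot) < 1$, $\sup \ldots < \infty$") of §9 above.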

§11. Hankel operators 

Prediction theory has always involved Toeplitz operators (as in the book [GrSz] by Grenander and Szego), and Toeplitz and Hankel operators have many links in operator theory. So it is natural that Hankel operators are useful in prediction theory. A monograph treatment of Hankel operators is given by Peller ([Pel3]; see also the review by Dym cited above in §7). Connections of Hankel operators with the matrix Muckenhoupt condition and with the matricial Nehari problem are considered by Arov and Dym in [ArDy3], Ch. 10, 11.
§12. Open questions

We mention two. 
Question 1. Find the matrix version of Baxter's theorem. 

As mentioned in §6, the matrix version of Baxter's inequality provides a good starting point.
Question 2. Find the matrix version of [KaBi]. 

This hinges on the solution of the matrix Nehari problem, the step

V ^ H. 
We hope to return to this elsewhere. 